(54) Replaying video information

(57) Video replay apparatus comprises a video material store
and a replay controller for controlling replay of video
material from the video material store; the replay controller
being operable to control replay of video material stored in
the store in accordance with associated data defining an
information content of the video material.
Description

[0001] The present invention relates to the field of re- 
playing video information. 

[0002] Video cameras produce audio and video footage
that will typically be extensively edited before a
broadcast quality programme is finally produced. The 
editing process can be very time consuming and there- 
fore accounts for a significant fraction of the production 
costs of any programme. 

[0003] Video images and audio data will often be ed- 
ited "off-line" on a computer-based digital non-linear ed- 
iting apparatus. A non-linear editing system provides the 
flexibility of allowing footage to be edited starting at any 
point in the recorded sequence. The images used for 
digital editing are often a reduced resolution copy of the 
original source material which, although not of broad- 
cast quality, is of sufficient quality for browsing the re- 
corded material and for performing off-line editing deci- 
sions. The video images and audio data can be edited 
independently. 

[0004] The end-product of the off-line editing process 
is an edit decision list (EDL). The EDL is a file that iden- 
tifies edit points by their timecode addresses and hence 
contains the required instructions for editing the pro- 
gramme. The EDL is subsequently used to transfer the 
edit decisions made during the off-line edit to an "on- 
line" edit in which the master tape is used to produce a 
high-resolution broadcast quality copy of the edited pro- 
gramme. 

[0005] The off-line non-linear editing process, al- 
though flexible, can be very time consuming. It relies on 
the human operator to replay the footage in real time, 
segment shots into sub-shots and then to arrange the 
shots in the desired chronological sequence. Arranging 
the shots in an acceptable final sequence is likely to en- 
tail viewing the shot, perhaps several times over, to as- 
sess its overall content and consider where it should be 
inserted in the final sequence. 

[0006] The audio data could potentially be automati- 
cally processed at the editing stage by applying a 
speech detection algorithm to identify the audio frames 
most likely to contain speech. Otherwise the editor must 
listen to the audio data in real time to identify its overall 
content. 

[0007] Essentially the editor has to start from scratch 
and to replay the raw audio frames and video images 
and painstakingly establish the contents of the footage. 
Only then can decisions be made on how shots should 
be segmented and on the desired ordering of the final 
sequence. 

[0008] The invention provides video replay apparatus 
comprising: 

a video material store; 

a replay controller for controlling replay of video ma- 
terial from the video material store; and 
the replay controller being operable to control replay of
video material stored in the store in accordance with
associated data defining an information content of the
video material.

[0009] Various other respective aspects and features
of the invention are defined in the appended claims.
[0010] The invention addresses the difficulties described
above by providing a new way of replaying (e.g. shuttling
through) video material. Instead of replaying the video
material at a constant (or defined) frame rate, it is
replayed in accordance with information data defining an
information content of the video material, for example to
provide a constant (or user controllable) information rate.

[0011] In this way, the user can view the video material,
seeing its most important sections in an efficient
manner while skimming past sections having a low in- 
formation content, e.g. sections in which little changes 
from frame to frame. 

[0012] Preferably, the information replay rate is under
the control of a user control, for example a jog/shuttle 
wheel. In this case, in a shuttle mode, the angular dis- 
placement of the wheel can control the information, rath- 
er than the frame, replay rate. 

[0013] It is noted that, although a paper by Smith et al,
entitled "Video Skimming for Quick Browsing Based on Audio
and Image Characterisation", Tech Report CMU-CS-95-186,
Carnegie Mellon Univ., Pittsburgh, July 1995, discloses the
production of a shortened version of a piece of video
material in dependence on information content, no disclosure
has been made of the idea of allowing a variable information
replay rate in accordance with a user control. In contrast
to the Smith et al paper, the invention provides a system in
which the rate or type of information received by the user
can be varied at the time of replay.

[0014] Embodiments of the invention will now be de- 
scribed by way of example only with reference to the 
accompanying drawings, in which: 

Figure 1 is a schematic diagram of a first embodiment
of a video replay apparatus;
Figure 2 is a schematic diagram of a second embodiment
of a video replay apparatus;
Figure 3 shows a downstream audio and video
processing system;
Figure 4 shows a video camera and metastore;
Figure 5 is a schematic diagram of a feature extraction
module and a metadata extraction module; and
Figure 6 (shown as Figures 6a to 6i) is a schematic
chart showing information levels within an example
video sequence.

[0015] Figure 1 is a schematic diagram of a first embodiment
of a video replay apparatus 10, which may form part of a
video record/replay device, a video editing device or the
like. The apparatus receives video material (possibly with
associated audio material) and stores this in a content
storage device 20 such as a tape storage device or (more
preferably) a random access storage device such as a hard
disk, optical disk or solid state memory. The replay
apparatus 10 also receives metadata associated with the
video material, and stores this in a "metastore" (metadata
store) 30, preferably embodied as a random access storage
device such as a hard disk, optical disk or solid state
memory. Of course, it is a routine design detail as to
whether the content storage and metadata storage are
embodied as different devices or different logical
partitions within a single storage device.

[0016] The derivation and nature of the metadata will
be described below, but for the purposes of Figure 1 it
will be appreciated that the metadata, or data derived
directly from it, can give an indication of the
instantaneous "information" content of the video and/or
associated audio material. The term "information" is used
here to signify a quality of certain parts of the
audio/video material which renders them interesting to a
human editor or viewer. So, portions of higher information
content might include video scene changes, the appearance of
a face (or a new face) in the video material, periods of
speech, or the start and finish of such periods, in the
associated audio material, or instances of changes of
image "activity" (see below). Other measures of
"information" content can be derived, for example those
discussed in the Smith et al paper referenced above.
[0017] A replay controller 40 controls replay of the ma- 
terial stored in the content store 20 onto a viewer display 
screen or screen window 50. The replay controller 40 is 
operated under the control of a user control device 60 
which in this embodiment includes a "jog/shuttle" wheel 
but may of course be embodied in a variety of forms 
such as a keypad, a slider, a mouse-driven pointer or 
the like, a touch-screen control and so on.
[0018] The operation of a jog/shuttle wheel in a con- 
ventional replay device such as a video tape recorder is 
that, when the wheel is set to a "shuttle" mode, the re- 
play direction depends on the direction that the wheel is 
turned by the user, and the replay speed depends on 
how far the wheel is turned in that direction. Often a 
spring return (back to a zero-speed position) is provided. 
In a "jog" mode, the video is replayed (in either direction) 
by one frame for every incremental rotation (e.g. 1°) of
the wheel. 

[0019] Often, the user may switch between jog and 
shuttle modes of operation by simply pressing the 
wheel. 

[0020] In the present embodiments, the jog/shuttle 
wheel operates in a generally analogous manner, but 
with respect to the speed at which information is re- 
played or provided to the user, not the speed at which 
video frames are replayed. 

[0021] So, in a "shuttle" mode of information replay, 
the angular displacement of the wheel 60 from the "zero 
speed" centre position determines the rate at which
information is replayed to the user. Of course, the nature
of the "information" is to a certain extent subjective, even
if the measure of information can then be derived
analytically from the material. However, once the measures
of information have been set up, the apparatus operates
so that periods in the audio/video material in which very
little changes or happens (i.e. periods of low information 
content) will be replayed very quickly, and portions of 
the material having a higher information content will be 
replayed more slowly. 
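
By way of illustration only, the shuttle-mode behaviour described above might be sketched as follows. The function name `frames_to_display`, the per-frame list `info` and the parameter `info_per_tick` (standing in for the wheel displacement) are hypothetical and do not appear in the source; the sketch assumes a non-negative information measure is already available for every frame.

```python
def frames_to_display(info, info_per_tick):
    """Pick the frame shown on each display tick so that roughly
    `info_per_tick` units of information pass per tick.

    Low-information frames are skipped (fast replay); where every frame
    carries at least `info_per_tick`, every frame is shown (slow replay).
    """
    if info_per_tick <= 0:
        raise ValueError("info_per_tick must be positive")
    shown = []
    accumulated = 0.0
    for index, value in enumerate(info):
        accumulated += value
        if accumulated >= info_per_tick:
            shown.append(index)   # enough information has gone by: show a frame
            accumulated = 0.0
    return shown


# Eight quiet frames followed by three eventful ones.
measures = [0.25] * 8 + [1.0] * 3
print(frames_to_display(measures, 1.0))
```

Under these assumptions a larger wheel displacement would map to a larger `info_per_tick`, so that more frames are skipped between displayed frames.
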
[0022] In a "jog" mode, rotation of the wheel 60 by the
incremental amount causes the material as displayed to
move to the next point having a high information content,
for example, the next point at which the information
measures exceed a certain threshold.
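
A minimal sketch of the jog-mode step, assuming a single per-frame information measure and a fixed threshold, is given below. The names `next_high_information_frame`, `info`, `threshold` and `direction` are illustrative assumptions only.

```python
def next_high_information_frame(info, current, threshold, direction=1):
    """Return the index of the next frame, in the given direction
    (+1 forwards, -1 backwards), whose information measure exceeds
    `threshold`, or None if there is no such frame."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    index = current + direction
    while 0 <= index < len(info):
        if info[index] > threshold:
            return index
        index += direction
    return None


measures = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3]
print(next_high_information_frame(measures, 1, 0.5))       # jog forwards
print(next_high_information_frame(measures, 4, 0.5, -1))   # jog backwards
```
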
[0023] The replay controller can simply read data defining
the information content from the metastore, or can
instead derive it directly from data stored in the metas- 
tore. An example in which the metastore holds the in- 
formation measure data will be described below, but it 
is a routine modification to have some of the calculations 
performed, for example only when needed, by the replay 
controller. 

[0024] Figure 2 schematically illustrates a second embodiment
of a video replay apparatus 10' which in many respects is
similar to the apparatus 10 of Figure 1. However, in
Figure 2 the metadata extraction is performed
by a metadata extractor 70 on the video material stored 
in the content storage device 20. The techniques used 
to extract metadata will be described below with refer- 
ence to Figures 3 to 5, in the context of an apparatus of 
the type shown in Figure 1. It is a routine modification
to employ those same techniques in the metadata ex- 
tractor 70 of Figure 2. 

[0025] So, Figures 3 to 5 will describe the acquisition 
of video/audio material and the derivation/extraction of 
metadata and information data from that material into 
the metastore 30. 

[0026] Figure 3 shows an audio-visual processing 
system. A camera 210 records audio and video data on 
video tape in the camera. The camera 210 also produc- 
es and records supplementary information about the re- 
corded video footage known as "metadata". This meta- 
data will typically include the recording date, recording 
start/end flags or timecodes, camera status data and a 
unique identification index for the recorded material 
known as an SMPTE UMID. 

[0027] The UMID is described in the March 2000 is- 
sue of the "SMPTE Journal". An "extended UMID" com- 
prises a first set of 32 bytes of "basic UMID" and a sec- 
ond set of 32 bytes of "signature metadata". 
[0028] The basic UMID has a key-length-value (KLV)
structure and it comprises:

■ A 12-byte Universal Label or key which identifies
the SMPTE UMID itself and the type of material to
which the UMID refers. It also defines the methods
by which the globally unique Material and locally
unique Instance numbers (defined below) are created.

■ A 1-byte length value which specifies the length of
the remaining part of the UMID.

■ A 3-byte Instance number used to distinguish between
different 'instances' or copies of material with
the same Material number.

■ A 16-byte Material number used to identify each
clip. A Material number is provided at least for each
shot and potentially for each image frame.

[0029] The signature metadata comprises: 

■ An 8-byte time/date code identifying the time of
creation of the "Content Unit" to which the UMID
applies. The first 4 bytes are a Universal Time Code
(UTC) based component.

■ A 12-byte value which defines the (GPS derived)
spatial co-ordinates at the time of Content Unit
creation.

■ 3 groups of 4-byte codes which comprise a country
code, an organisation code and a user code.
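
Purely as an illustration of the field sizes listed above, a 64-byte extended UMID could be split into its components as sketched below. The sketch assumes the fields are laid out in the order in which they are listed here; the governing SMPTE document should be consulted for the normative byte layout. The names `ExtendedUmid` and `parse_extended_umid` are hypothetical.

```python
from typing import NamedTuple


class ExtendedUmid(NamedTuple):
    universal_label: bytes      # 12 bytes
    length: int                 # 1 byte
    instance_number: bytes      # 3 bytes
    material_number: bytes      # 16 bytes
    time_date: bytes            # 8 bytes
    spatial_coordinates: bytes  # 12 bytes
    country_code: bytes         # 4 bytes
    organisation_code: bytes    # 4 bytes
    user_code: bytes            # 4 bytes


def parse_extended_umid(data: bytes) -> ExtendedUmid:
    """Split 64 bytes into a 32-byte basic UMID and 32 bytes of
    signature metadata, assuming the field order given in the text."""
    if len(data) != 64:
        raise ValueError("an extended UMID is 64 bytes long")
    basic, signature = data[:32], data[32:]
    return ExtendedUmid(
        universal_label=basic[0:12],
        length=basic[12],
        instance_number=basic[13:16],
        material_number=basic[16:32],
        time_date=signature[0:8],
        spatial_coordinates=signature[8:20],
        country_code=signature[20:24],
        organisation_code=signature[24:28],
        user_code=signature[28:32],
    )


umid = parse_extended_umid(bytes(range(64)))
print(len(umid.material_number), len(umid.spatial_coordinates))
```
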

[0030] Apart from the basic metadata described above,
which serves to identify properties of the recording
itself, additional metadata is provided which describes
in detail the contents of the recorded audio data
and video images. This additional metadata comprises
"feature-vectors", preferably on a frame-by-frame basis,
and is generated by hardware in the camera 210 by
processing the raw video and audio data in real time as
(or immediately after) it is captured.
[0031] The feature vectors could for example supply 
data to indicate if a given frame has speech associated 
with it and whether or not it represents an image of a 
face. Furthermore the feature vectors could include in- 
formation about certain image properties such as the 
magnitudes of hue components in each frame. 
[0032] The main metadata, which includes a UMID
and start/end timecodes, could be recorded on videotape
along with the audio and video data, but preferably
it will be stored using a proprietary system such as
Sony's "Tele-File®" system. Under this Tele-File system,
the metadata is stored in a contact-less memory inte- 
grated circuit contained within the video-cassette label 
which can be read, written and rewritten with no direct 
electrical contact to the label. 

[0033] All of the metadata information is transferred 
to the metastore 30 along a metadata data path 215 
which could represent videotape, a removable hard disk 
drive or a wireless local area network (LAN). The metas- 
tore 30 has a storage capacity 230 and a central 
processing unit 240 which performs calculations to ef- 
fect full metadata extraction and analysis. The metas- 
tore 30 uses the feature-vector metadata: to automate 
functions such as sub-shot segmentation; to identify 
footage likely to correspond to an interview as indicated 
by the simultaneous detection of a face and speech in 
a series of contiguous frames; to produce representa- 



tive images for use in an off-line editing system which 
reflect the predominant overall contents of each shot; 
and to calculate properties associated with encoding of 
the audio and video information. 
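
The interview-detection idea mentioned above (a face and speech detected simultaneously over a series of contiguous frames) might be sketched as follows. The sketch assumes that the face and speech flags have already been aligned to a common frame index; the function name `find_interview_segments` and the parameter `min_frames` are hypothetical.

```python
def find_interview_segments(face_flags, speech_flags, min_frames=25):
    """Return (first, last) frame index pairs for runs of at least
    `min_frames` contiguous frames in which both a face and speech
    were detected."""
    segments = []
    start = None
    count = min(len(face_flags), len(speech_flags))
    for index in range(count):
        if face_flags[index] and speech_flags[index]:
            if start is None:
                start = index            # a candidate run begins here
        else:
            if start is not None and index - start >= min_frames:
                segments.append((start, index - 1))
            start = None
    if start is not None and count - start >= min_frames:
        segments.append((start, count - 1))   # run extends to the last frame
    return segments


face = [0, 1, 1, 1, 1, 0, 1, 1]
speech = [1, 1, 1, 1, 0, 0, 1, 1]
print(find_interview_segments(face, speech, min_frames=2))
```

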
[0034] Thus the metadata feature-vector information 
affords automated processing of the audio and video da- 
ta prior to editing. Metadata describing the contents of 
the audio and video data is centrally stored in the metas- 
tore 30 and it is linked to the associated audio and video 
data by a unique identifier such as the SMPTE UMID. 
The audio and video data will generally be stored inde- 
pendently of the metadata. The use of the metastore 
makes feature-vector data easily accessible and pro- 
vides a large information storage capacity. 
[0035] The metastore also performs additional 
processing of feature-vector data, automating many 
processes that would otherwise be performed by the ed- 
itor. The processed feature-vector data is potentially 
available at the beginning of the off-line editing process 
which should result in a much more efficient and less 
time-consuming editing operation. 
[0036] Figure 4 illustrates schematically how the main 
components of the video camera 210 and the metastore 
30 interact according to embodiments of the invention. 
An image pickup device 250 generates audio and video 
data signals 255 which it feeds to an image processing 
module 260. The image processing module 260 per- 
forms standard image processing operations and out- 
puts processed audio and video data along a main data 
path 285. The audio and video data signals 255 are also 
fed to a feature extraction module 280 which performs 
processing operations such as speech detection and 
hue histogram calculation, and outputs feature-vector
data 295. The image pickup device 250 supplies a signal 
265 to a metadata generation unit 270 that generates 
the basic metadata information 275 which includes a ba- 
sic UMID and start/end timecodes. The basic metadata 
information and the feature-vector data 295 are multi- 
plexed and sent along a metadata data path 215. 
[0037] The metadata data path is directed into a metadata
extraction module 290 located in the metastore 30.
The metadata extraction module 290 performs full metadata
extraction and uses the feature-vector data 295
generated in the video camera to perform additional data
processing operations to produce additional information
(i.e. additional metadata) about the content of the recorded
sound and images. For example the hue feature vectors can
be used by the metadata extraction module 290 to perform
sub-shot segmentation. This process will be described
below. The output data 315 of the
metadata extraction module 290 is recorded in the main 
storage area 230 of the metastore 30 where it can be 
retrieved by an off-line editing apparatus. 
[0038] Figure 5 is a schematic diagram of a feature 
extraction module and a metadata extraction module 
according to embodiments of the invention. 
[0039] As mentioned above, the left hand side of Figure 5
shows that the feature extraction module 280 of the video
camera 210 comprises a hue histogram calculation unit 300,
a speech detection unit 310 and a face detection unit 320.
The outputs of these feature extraction units are supplied
to the metadata extraction module 290 for further
processing.

[0040] The hue histogram calculation unit 300 performs
an analysis of the hue values of each image. Image
pick-up systems in a camera detect primary-colour red,
green and blue (RGB) signals. These signals are
format-converted and stored in a different colour space
representation. On analogue video tape (such as PAL
and NTSC) the signals are stored in YUV space whereas
digital video systems store the signals in the standard
YCrCb colour space. A third colour space is
hue-saturation-value (HSV). The hue reflects the dominant
wavelength of the spectral distribution, the saturation is
a measure of the concentration of a spectral distribution
at a single wavelength and the value is a measure of the
intensity of the colour. In the HSV colour space hue
specifies the colour in a 360° range.
[0041] The hue histogram calculation unit 300 performs,
if so required, the conversion of audio and video
data signals from an arbitrary colour space to the HSV
colour space. The hue histogram calculation unit 300
then combines the hue values for the pixels of each
frame to produce for each frame a "hue histogram" of
frequency of occurrence as a function of hue value. The
hue values are in the range 0° < hue < 360° and the
bin-size of the histogram, although potentially adjustable,
would typically be 1°. In this case a feature vector with
360 elements will be produced for each frame. Each
element of the hue feature vector will represent the
frequency of occurrence of the hue value associated with
that element. Hue values will generally be provided for
every pixel of the frame but it is also possible that a
single hue value will be derived (e.g. by an averaging
process) corresponding to a group of several pixels. The hue
feature vectors can subsequently be used in the metadata
extraction module 290 to perform sub-shot segmentation
and representative image extraction.
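
As a non-authoritative sketch of the hue histogram described above, the following assumes 8-bit RGB pixels and 1° bins, and uses the Python standard library `colorsys` module for the RGB to HSV conversion. The function name `hue_histogram` is hypothetical, and rounding each hue to the nearest whole degree is an assumption of the sketch rather than something stated in the text.

```python
import colorsys


def hue_histogram(pixels):
    """Return a 360-element list counting how many pixels fall at each
    hue (in whole degrees). `pixels` is an iterable of (r, g, b) tuples
    with components in the range 0..255."""
    histogram = [0] * 360
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        degrees = round(hue * 360.0) % 360   # hue is returned in the range 0..1
        histogram[degrees] += 1
    return histogram


frame = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
hist = hue_histogram(frame)
print(hist[0], hist[120], hist[240])
```

Note that `colorsys` reports a hue of 0 for grey pixels, so in this sketch unsaturated pixels are counted in the first bin.

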
[0042] The speech detection unit 310 in the feature 
extraction module 280 performs an analysis of the re- 
corded audio data. The speech detection unit 310 per- 
forms a spectral analysis of the audio material, typically 
on a frame-by-frame basis. In this context, the term
"frame" refers to an audio frame of perhaps 240
milliseconds duration and not to a video frame. The spectral
content of each audio frame is established by applying
a fast Fourier transform (FFT) to the audio data using
either software or hardware. This provides a profile of
the audio data in terms of power as a function of
frequency.
[0043] The speech detection technique used in this
embodiment exploits the fact that human speech tends
to be heavily harmonic in nature. This is particularly true
of vowel sounds. Although different speakers have
different pitches in their voices, which can vary from frame
to frame, the fundamental frequencies of human speech
will generally lie in the range from 50-2500 Hz. The
content of the audio data is analysed by applying a series
of "comb filters" to the audio data. A comb filter is an
Infinite Impulse Response (IIR) filter that routes the
output samples back to the input after a specified delay
time. The comb filter has multiple relatively narrow
passbands, each having a centre frequency at an integer
multiple of the fundamental frequency associated with
the particular filter. The output of the comb filter based
on a particular fundamental frequency provides an
indication of how heavily the audio signal in that frame is
harmonic about that fundamental frequency. A series of
comb filters with fundamental frequencies in the range
50-2500 Hz is applied to the audio data.
[0044] When an FFT process is applied to the audio 
material first, as in this embodiment, the comb filter is 
conveniently implemented as a simple selection of certain
FFT coefficients.

[0045] The sliding comb filter thus gives a quasi-con- 
tinuous series of outputs, each indicating the degree of 
harmonic content of the audio signal for a particular
fundamental audio frequency. Within this series of outputs,
the maximum output is selected for each audio frame.
This maximum output is known as the "Harmonic Index" 
(HI) and its value is compared with a predetermined 
threshold to determine whether or not the associated 
audio frame is likely to contain speech. 
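
The following is a sketch only of how a Harmonic Index could be obtained by selecting spectral coefficients at integer multiples of each candidate fundamental. Several details are assumptions of the sketch and not taken from the source: a naive discrete Fourier transform stands in for the FFT so that the example is self-contained, only the first `teeth` harmonics of each comb are used, and the comb output is normalised by the total (non-DC) power so that it lies between 0 and 1. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each frequency bin below Nyquist (naive DFT, O(n^2))."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def harmonic_index(samples, sample_rate, f_min=50.0, f_max=2500.0, teeth=5):
    """Return (harmonic_index, fundamental_hz) for one audio frame.

    For each candidate fundamental bin, the power at its first `teeth`
    integer multiples is summed and divided by the total power; the
    maximum over all candidates is the harmonic index."""
    power = power_spectrum(samples)
    total = sum(power[1:])              # ignore the DC bin
    if total == 0.0:
        return 0.0, None
    bin_hz = sample_rate / len(samples)
    k_min = max(1, math.ceil(f_min / bin_hz))
    k_max = min(len(power) - 1, math.floor(f_max / bin_hz))
    best_score, best_f0 = 0.0, None
    for k0 in range(k_min, k_max + 1):
        comb = sum(power[k] for k in range(k0, len(power), k0)[:teeth])
        score = comb / total
        if score > best_score:
            best_score, best_f0 = score, k0 * bin_hz
    return best_score, best_f0


# A synthetic frame: 250 Hz fundamental plus two weaker harmonics.
rate, length = 8000, 256
frame = [
    math.sin(2 * math.pi * 250 * t / rate)
    + 0.5 * math.sin(2 * math.pi * 500 * t / rate)
    + 0.25 * math.sin(2 * math.pi * 750 * t / rate)
    for t in range(length)
]
index, fundamental = harmonic_index(frame, rate)
print(round(index, 3), fundamental)
```

A speech flag would then follow from comparing the returned index with a predetermined threshold, the value of which is not given in the text.

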
[0046] The speech detection unit 310 located in the 
feature extraction module 280, produces a feature-vec- 
tor for each audio frame. In its most basic form this is a 
simple flag that indicates whether or not speech is
present. Data corresponding to the harmonic index for 
each frame could also potentially be supplied as feature- 
vector data. Alternative embodiments of the speech de- 
tection unit 310 might output a feature-vector compris- 
ing the FFT coefficients for each audio frame, in which 
case the processing to determine the harmonic index 
and the likelihood of speech being present would be car- 
ried out in the metadata extraction module 290. The fea- 
ture extraction module 280 could include an additional 
unit 330 for audio frame processing to detect musical 
sequences or pauses in speech. 
[0047] The face detection unit 320 located in the fea- 
ture extraction module 280, analyses video images to 
determine whether or not a human face is present. This 
unit implements an algorithm to detect faces such as the 
FaceIt® algorithm produced by the Visionics Corporation
and commercially available at the priority date of
this patent application. This face detection algorithm us- 
es the fact that all facial images can be synthesised from 
an irreducible set of building elements. The fundamental 
building elements are derived from a representative en- 
semble of faces using statistical techniques. There are 
more facial elements than there are facial parts. Individ- 
ual faces can be identified by the facial elements they 
possess and by their geometrical combinations. The
algorithm can map an individual's identity into a
mathematical formula known as a "faceprint". Each facial
image can be compressed to produce a faceprint of around
84 bytes in size. The face of an individual can be
recognised from this faceprint regardless of changes in
lighting or skin tone, facial expressions or hairstyle and in
the presence or absence of spectacles. Variations in the
angle of the face presented to the camera can be up to
around 35° in all directions and movement of faces can
be tolerated.

[0048] The algorithm can therefore be used to determine
whether or not a face is present on an image-by-image
basis and to determine a sequence of consecutive
images in which the same faceprint appears. The
software supplier asserts that faces which occupy as
little as 1% of the image area can be recognised using the
algorithm.

[0049] The face detection unit 320 outputs basic
feature-vectors 355 for each image comprising a simple
flag to indicate whether or not a face has been detected
in the respective image. Furthermore, the faceprint data
for each of the detected faces is output as feature-vector
data 355, together with a key or lookup table which
relates each image in which at least one face has been
detected to the corresponding detected faceprint(s).
This data will ultimately provide the editor with the
facility to search through and select all of the recorded
video images in which a particular faceprint appears.
[0050] The right hand side of Figure 5 shows that the
metadata extraction module 290 of the video camera
210 comprises a hue histogram statistics unit 350, an
"activity" calculation unit 360, a sub-shot segmentation 
unit 370 and a change detector 380. 
[0051] The hue histogram statistics unit 350 uses the 
feature vector data 355 for the hue image property. It 
develops a rolling average of hue histogram data and 
detects changes between a current hue histogram and 
the current value of the rolling average. The rolling
average can be, for example, an average of one second's
worth of normal-speed-replayed video. The change
detection can be by means of a single-valued difference
diff_F between the current hue histogram for a frame F
and the current value of the rolling average. The
derivation of a single valued difference figure is discussed
below.
[0052] The hue histogram statistics unit 350 can also 
extract a representative image which reflects the pre- 
dominant overall content of a shot. The hue histogram 
data included in feature-vector data 355 comprises a
hue histogram for each image. This feature-vector data 
is combined with the sub-shot segmentation information 
output by sub-shot segmentation unit 370 to calculate 
the average hue histogram data for each shot. 
[0053] The hue histogram information for each frame
of the shot is used to determine an average histogram
for the shot according to the formula:

$$h'_i = \frac{1}{n_F} \sum_{F=1}^{n_F} h_i^F$$

where $i$ is an index for the histogram bins, $h'_i$ is the
average frequency of occurrence of the hue value
associated with the ith bin, $h_i^F$ is the hue value associated
with the ith bin for frame F and $n_F$ is the number of frames
in the shot. If the majority of the frames in the shot
correspond to the same scene then the hue histograms for
those frames will be similar in shape; therefore the
average hue histogram will be heavily weighted to reflect the
hue profile of that predominant scene.

[0054] The representative image is extracted by
performing a comparison between the hue histogram for
each frame of a shot and the average hue histogram for
that shot. A single valued difference $\mathrm{diff}_F$ is calculated
according to the formula:

$$\mathrm{diff}_F = \sqrt{\sum_{i=1}^{n_{bins}} \left(h'_i - h_i^F\right)^2}$$



[0055] From the $n_F$ frames F ($1 \le F \le n_F$) of a shot, the
one frame is selected which has the minimum value of
$\mathrm{diff}_F$. The above formula represents the preferred method
for calculating the single valued difference; however it
will be appreciated that alternative formulae can be used
to achieve the same effect. An alternative would be to sum
the absolute value of the difference ($h'_i - h_i^F$), to form a
weighted sum of differences or to combine difference values
for each image property of each frame. The frame with the
minimum difference will have the hue histogram closest to
the average hue histogram and hence it is preferably
selected as the representative keystamp (RKS) image for the
associated shot. If the value of the minimum difference is
the same for two frames or more in the same shot then there
are multiple frames which are closest to the average hue
histogram; however, the first of these frames can be
selected to be the representative keystamp. Although
preferably the frame with the hue histogram that is closest
to the average hue histogram is selected to be the RKS,
alternatively an upper threshold can be defined for the
single valued difference such that the first frame in the
temporal sequence of the shot having a difference which
lies below the threshold is selected as an RKS. It will
be appreciated that, in general, any frame of the shot
having a difference which lies below the threshold could be
selected as an RKS. The RKS images are the output of
representative image extraction unit
[0056] The RKS images can be used in the off-line
edit suite as thumbnail images to represent the overall
predominant contents of the shots. The editor can see
the RKS at a glance and its availability will reduce the
likelihood of having to replay a given shot in real time. 
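
By way of example only, the selection of a representative keystamp from the hue histograms of a shot might be sketched as below. The average histogram follows the definition given in the text. The single valued difference is taken here as the square root of the sum of squared bin differences, which is an assumption of this sketch; the text itself notes that other measures, such as a sum of absolute differences, could equally be used. The function names are hypothetical.

```python
import math


def average_histogram(histograms):
    """Bin-by-bin mean of the hue histograms of all frames in a shot."""
    n_frames = len(histograms)
    n_bins = len(histograms[0])
    return [sum(h[i] for h in histograms) / n_frames for i in range(n_bins)]


def single_valued_difference(average, histogram):
    """Assumed form: root of the summed squared bin differences."""
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(average, histogram)))


def representative_keystamp(histograms):
    """Index of the frame whose histogram is closest to the shot average.
    When several frames tie, the first in temporal order is returned."""
    average = average_histogram(histograms)
    differences = [single_valued_difference(average, h) for h in histograms]
    return differences.index(min(differences))


# Four frames with three hue bins each (toy data).
shot = [[4, 0, 0], [3, 1, 0], [0, 0, 4], [3, 1, 0]]
print(average_histogram(shot), representative_keystamp(shot))
```

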
[0057] The "activity" calculation unit 360 uses the hue
feature-vector data generated by the hue histogram cal- 
culation unit 300 to calculate an activity measure for the 
captured video images. The activity measure gives an 
indication of how much the image sequence changes 
from frame to frame. It can be calculated on a global 
level such as across the full temporal sequence of a shot 
or at a local level with respect to an image and its sur- 
rounding frames. In this embodiment the activity meas- 
ure is calculated from the local variance in the hue val- 
ues. It will be appreciated that the local variance of other 
image properties such as the luminosity could alterna- 
tively be used to obtain an activity measure. The advan- 
tage of using the hue is that the variability in the activity 
measure due to changes in lighting conditions is re- 
duced. A further alternative would be to use the motion 
vectors to calculate an activity measure. 
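
One possible reading of a local, variance-based activity measure is sketched below. The text does not specify how the local variance is formed, so the choices made here (a window of frames either side of the frame of interest, and the per-bin variance of the hue histograms summed over all bins) are assumptions, as are the names `local_activity` and `half_window`.

```python
def local_activity(histograms, frame, half_window=2):
    """Sum over hue bins of the variance of each bin's count across the
    frames within `half_window` of `frame`. Higher values indicate that
    the image content is changing more from frame to frame."""
    first = max(0, frame - half_window)
    last = min(len(histograms), frame + half_window + 1)
    window = histograms[first:last]
    n = len(window)
    total = 0.0
    for i in range(len(window[0])):
        values = [h[i] for h in window]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / n
    return total


static = [[5, 5]] * 5
changing = [[10, 0], [0, 10], [10, 0], [0, 10], [10, 0]]
print(local_activity(static, 2), local_activity(changing, 2))
```

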
[0058] The "activity" calculation unit 360 serves to 
measure the activity level in the audio signal associated 
with the video images. It uses the feature-vectors pro- 
duced by the speech detection unit 310 and performs 
processing operations to identify temporal sequences 
of normal speech activity, to identify pauses in speech 
and to distinguish speech from silence and from back- 
ground noise. The volume of the sound is also used to 
identify high audio activity. This volume-based audio ac- 
tivity information is particularly useful for identifying sig- 
nificant sections of the video footage for sporting events 
where the level of interest can be gauged by the crowd 
reaction. 
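
A minimal sketch of the volume-based part of this processing is given below. It assumes that the audio is available as a list of sample amplitudes and that loudness is estimated as the root-mean-square level of fixed-size blocks; the block size, the threshold and the function names are hypothetical and are not taken from the description.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def high_activity_blocks(samples, block_size, threshold):
    """Return the indices of blocks whose RMS level exceeds `threshold`.

    Only complete blocks are examined; a trailing partial block is ignored.
    """
    flagged = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        if rms(samples[start:start + block_size]) > threshold:
            flagged.append(start // block_size)
    return flagged
```

For sports footage, the flagged blocks would correspond to passages such as a loud crowd reaction.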

[0059] The sub-shot segmentation module uses the 
feature vector data 355 for the hue image property to 
perform sub-shot segmentation. The sub-shot segmen- 
tation is performed by calculating the element-by-ele- 
ment difference between the hue histograms for con- 
secutive images and by combining these differences to 
produce a single valued difference. A scene change is 
flagged by locating an image with a single valued differ- 
ence that lies above a predetermined threshold. 
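
The comparison of consecutive hue histograms might be sketched as follows. The passage above does not state how the element-by-element differences are combined, so the sum of absolute differences used here is an assumption, as are the function names.

```python
def histogram_difference(hist_a, hist_b):
    """Combine element-by-element differences into a single value.

    Assumption: the differences are combined as a sum of absolute values.
    """
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def find_scene_changes(histograms, threshold):
    """Return the indices of images whose difference from the
    preceding image lies above `threshold`."""
    changes = []
    for i in range(1, len(histograms)):
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold:
            changes.append(i)
    return changes
```

With four three-bin histograms in which the dominant bin moves between the second and third images, only the third image (index 2) is flagged.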
[0060] Similarly a localised change in the subject of a 
picture, such as the entry of an additional actor to a 
scene, can be detected by calculating the single-valued 
difference between the hue histogram of a given image 
and a hue histogram representing the average hue val- 
ues of images from the previous one second of video 
footage. 
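
The comparison against the preceding one second of footage might look like the sketch below. The frame rate of 25 frames per second, the use of a sum of absolute differences and the function names are assumptions made for illustration.

```python
def average_histogram(histograms):
    """Bin-by-bin mean of a non-empty list of equal-length histograms."""
    count = len(histograms)
    return [sum(bins) / count for bins in zip(*histograms)]


def difference_from_recent_average(histograms, index, frames_per_second=25):
    """Single-valued difference between image `index` and the average
    histogram of the images in the previous one second of footage.

    Returns 0.0 for the first image, which has no preceding footage.
    """
    start = max(0, index - frames_per_second)
    previous = histograms[start:index]
    if not previous:
        return 0.0
    average = average_histogram(previous)
    return sum(abs(h - a) for h, a in zip(histograms[index], average))
```

Because the reference is an average over many frames, a gradual or partial change in the picture produces a smaller value than a full scene change would.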

[0061] The change detector 380 detects changes in 
the outputs of the activity calculation unit 360 and the 
sub-shot segmentation unit 370, for example by com- 
paring the current value of a particular metric with a roll- 
ing average of the values corresponding to the last (say) 
one second of normal-replay-speed video. 
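
One possible form of such a comparison is sketched below. The class name, the window length expressed in samples and the use of an absolute-difference threshold are all assumptions; the description states only that the current value is compared with a rolling average.

```python
from collections import deque


class ChangeDetector:
    """Flags a change when a metric departs from its rolling average.

    `window` is the number of past values averaged (for example the
    number of frames in one second); `threshold` is the smallest
    absolute departure that counts as a change. Both are hypothetical
    parameters for this sketch.
    """

    def __init__(self, window, threshold):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value):
        """Feed the next metric value; return True if it is a change."""
        changed = False
        if self.history:
            average = sum(self.history) / len(self.history)
            changed = abs(value - average) >= self.threshold
        self.history.append(value)
        return changed
```

A step in the metric is flagged for as long as the rolling average lags behind the new level, and stops being flagged once the window has filled with the new values.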



[0062] It has been noted above that the hue histogram 
statistics unit 350 detects changes in the hue histogram 
data. By detecting these changes, it is possible to detect 
instances in the video sequence which are perceived to 
be more significant or to have a higher information con- 
tent than periods of the video sequence where very little 
changes from frame to frame. 

[0063] Although the description of Figures 3 to 5 has 
covered a system in which the metadata derivation is 
partitioned between the camera and a separate 
processing apparatus, the skilled man will of course ap- 
preciate that the metadata derivation could take place 
in a single apparatus or be partitioned between appara- 
tuses in a different manner. 

[0064] Figure 6 (shown as Figures 6a to 6h) is a sche- 
matic chart showing information levels within an exam- 
ple video sequence. 

[0065] Figure 6a schematically illustrates a video se- 
quence, with time running from left to right. 

[0066] Figure 6b schematically illustrates an activity 
measure within the video sequence, as derived by the 
activity calculation unit 360. Figure 6c schematically 
represents a detection of changes within that activity 
value as detected by the change detector 380. 

[0067] Figure 6d represents a speech detection flag 
indicating the likely presence of speech as detected by 
the speech detector 310. Figure 6e schematically indi- 
cates changes in the speech detection flag, as detected 
by the change detector 380. 

[0068] Figure 6f schematically illustrates changes in 
the hue histogram data (in this embodiment, changes in 
the single-valued difference value diff F ) with time, as de- 
tected by the hue histogram statistics unit 350. 
[0069] Figure 6g schematically illustrates face flag da- 
ta as detected by the face detector 320. Figure 6h sche- 
matically illustrates changes in the face flag data as de- 
tected by the change detector 380. 
[0070] In the present embodiment, it is the change da- 
ta (Figures 6c, 6e, 6f and 6h) which is considered to rep- 
resent information content by which the shuttle or jog 
operation is controlled. So, portions of the video/audio 
material in which changes are taking place close to one 
another are replayed more slowly, and portions where 
changes are more sparsely distributed are replayed 
more quickly. 

[0071] The replay speed can be set at, for example, 
n information "events" per unit time, where an event is 
defined as at least a threshold change in one of the in- 
formation measures defined above. The "unit time" is 
defined by an inverse relationship to the rotary position 
of the shuttle wheel 60, so that as the shuttle wheel 60 
is rotated further the "unit time" becomes shorter. 
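
By way of a sketch only, the relationship between the wheel position and the replay speed might be expressed as follows. The exact inverse proportion, the `base_seconds` constant, the use of a fixed window of source material and the function names are assumptions; the description specifies only that the relationship is inverse.

```python
def unit_time(wheel_position, base_seconds=4.0):
    """Length of the "unit time" in seconds for a given wheel position.

    Assumption: a simple inverse proportion, so that doubling the wheel
    displacement halves the unit time. `base_seconds` is hypothetical.
    """
    if wheel_position <= 0:
        raise ValueError("wheel_position must be positive")
    return base_seconds / wheel_position


def replay_speed(event_times, window_start, window_end, n, wheel_position):
    """Speed factor that replays a window of source material at
    n information events per unit time.

    `event_times` are event positions in seconds of source material.
    Returns None when the window holds no events, leaving the caller
    to choose a maximum speed.
    """
    events = sum(1 for t in event_times if window_start <= t < window_end)
    if events == 0:
        return None
    replay_duration = events * unit_time(wheel_position) / n
    return (window_end - window_start) / replay_duration
```

With these assumptions, a ten-second window holding three events is replayed more slowly than a twenty-second window holding one event, and rotating the wheel further speeds up both.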
[0072] Alternatively, the nature of the replayed infor- 
mation can be changed as the wheel 60 is rotated. For 
example, at small angular displacements of the wheel, 
information events where the information content ex- 
ceeds a small threshold can be replayed. As the wheel 
is rotated further the threshold increases until, at large 
rotations, only the most significant events are replayed. 
This gives a particular flexibility to the operation of the 
apparatus. 
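
This alternative might be sketched as follows. The linear mapping from wheel angle to threshold, the numeric limits and the representation of an event as a (frame, information content) pair are all assumptions made for illustration.

```python
def threshold_for_angle(angle_degrees, min_threshold=0.1,
                        max_threshold=0.9, max_angle=90.0):
    """Threshold that rises with the angular displacement of the wheel.

    Assumption: a linear rise, clamped at `max_angle`. The limits are
    hypothetical values on a 0-to-1 information-content scale.
    """
    fraction = min(abs(angle_degrees), max_angle) / max_angle
    return min_threshold + fraction * (max_threshold - min_threshold)


def events_to_replay(events, angle_degrees):
    """Return the frames of events whose information content exceeds
    the threshold for the current wheel angle.

    `events` is a list of (frame, information_content) pairs.
    """
    threshold = threshold_for_angle(angle_degrees)
    return [frame for frame, content in events if content > threshold]
```

At a small displacement every event is replayed; as the wheel is turned further, only the events with the highest information content remain.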
[0073] As well as displaying frames at which an infor- 
mation event occurs, the apparatus can be arranged to 
display at least, say, the m frames either side of such a 
frame, or the m frames following such a frame, where 
m is selected so as to allow sufficient frames for the user 
to comprehend the scene before the replay moves on 
to the next scene. 
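
The selection of surrounding frames might be sketched as follows. The function name, the `both_sides` switch and the merging of overlapping ranges into one sorted list are assumptions made for this sketch.

```python
def frames_to_display(event_frames, m, total_frames, both_sides=True):
    """Return the sorted frame numbers to display around each event.

    With `both_sides` set, the m frames either side of each event frame
    are included; otherwise only the m frames following it. Ranges are
    clipped to the sequence and overlapping ranges are merged.
    """
    selected = set()
    for frame in event_frames:
        first = frame - m if both_sides else frame
        last = frame + m
        for f in range(max(0, first), min(total_frames, last + 1)):
            selected.add(f)
    return sorted(selected)
```

For events at frames 5 and 7 with m equal to 2, the two ranges overlap and frames 3 to 9 are displayed once each.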



Claims 

1. Video replay apparatus comprising: 
a video material store; 
a replay controller for controlling replay of video 
material from the video material store; and 
the replay controller being operable to control 
replay of video material stored in the store in 
accordance with associated data defining an 
information content of the video material. 

2. Apparatus according to claim 1, in which the 
information data defines information events within the 
video material, the replay controller being operable 
to replay the video material so as to give a defined 
rate of information events in the replayed sequence. 

3. Apparatus according to claim 2, comprising a user 
control, in which the replay controller is operable to 
vary a threshold information content to define an 
information event within the video material, in 
response to the user control. 

4. Apparatus according to claim 2, comprising a user 
control, in which the rate of information events is 
defined by the user control. 

5. Apparatus according to any one of the preceding 
claims, in which the data defining the information 
content of the video material comprises data defining 
the presence of faces within the video material. 

6. Apparatus according to any one of the preceding 
claims, in which the data defining the information 
content of the video material comprises data defining 
the presence of speech within audio material 
associated with the video material. 

7. Apparatus according to any one of the preceding 
claims, in which the data defining the information 
content of the video material comprises data defining 
image activity within the video material. 

8. Apparatus according to any one of the preceding 
claims, in which the data defining the information 
content of the video material comprises data defining 
colour content of the video material. 

9. Apparatus according to any one of the preceding 
claims, comprising an information content analyser 
for deriving the information content data from the 
video material and/or associated audio material. 

10. A method of replaying stored video material, the 
method comprising the steps of controlling replay 
of video material from the video material store in 
accordance with associated data defining an 
information content of the video material. 

11. A method of video replay, the method being 
substantially as hereinbefore described with reference 
to the accompanying drawings. 

12. Computer software having program code for 
carrying out a method according to claim 10 or claim 11. 

13. A data providing medium by which computer 
software according to claim 12 is provided. 

14. A medium according to claim 13, the medium being 
a transmission medium. 

15. A medium according to claim 13, the medium being 
a storage medium. 